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Abstract. In this paper we are concerned with the solution of Compressed Sensing (CS) prob- 
lems where the signals to be recovered are sparse in coherent and redundant dictionaries. We extend 
the primal-dual Newton Conjugate Gradients (pdNCG) method in |11| for CS problems. We provide 
an inexpensive and provably effective preconditioning technique for linear systems using pdNCG. 
Numerical results are presented on CS problems which demonstrate the performance of pdNCG with 
the proposed preconditioner compared to state-of-the-art existing solvers. 
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1. Introduction. CS is concerned with recovering a signal x G M n by observing 
a linear combination of the signal 

b = Ax, 

where A G M mxn is an under-determined linear operator with m < n and b G M m 
are the observed measurements. Although this system has infinitely many solutions, 
reconstruction of x is possible due to its assumed properties. In particular, x is 
assumed to have a sparse image through a coherent and redundant dictionary W G 
E nxl , where E = M or C and n < 1. More precisely, W*x, is sparse, i.e. it has only few 
non-zero components, where the star superscript denotes the conjugate transpose. If 
W*x is sparse, under certain conditions on matrices A and W (discussed in Subsection 
1.2) the optimal solution of the linear problem 


minimize ||VF*x||i, subject to: Ax = b 


is x, where || • ||i is the iq-norm. 

Frequently measurements b might be contaminated with noise, i.e. one measures 
b = b -Ye instead, where e is a vector of noise, usually modelled as Gaussian with 
zero-mean and bounded Euclidean norm. In addition, in realistic applications, W*x 
might not be exactly sparse, but its mass might be concentrated only on few of its 
components, while the rest are rapidly decaying. In this case, (again under certain 
conditions on matrices A and W) the optimal solution of the following problem 

(1.1) minimize f c (x) := c||IF*x||i + — b\\ 

is proved to be a good approximation to i In <o>, c is an a-priori chosen positive 
scalar and || • ||2 is the Euclidean norm. 
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1.1. Brief description of CS applications. An example of W being redun¬ 
dant and coherent with orthonormal rows is the curvelet frame where an image is 
assumed to have an approximately sparse representation [8]. Moreover, for radar and 
sonar systems it is frequent that Gabor frames are used in order to reconstruct pulse 
trains from CS measurements |21j. For more applications a small survey is given in 
[9]. Isotropic Total-Variation (iTV) is another application of CS, which exploits the 
fact that digital images frequently have slowly varying pixels, except along edges. 
This property implies that digital images with respect to the discrete nabla operator, 
i.e. local differences of pixels, are approximately sparse. For iTV applications, matrix 
W G C nxn is square, complex and rank-deficient with rank(W ) = n — 1. An alterna¬ 
tive to iTV is G-analysis, where matrix W is a Haar wavelet transform. However, a 
more pleasant to the eye reconstruction result is obtained by solving the iTV problem 
compared to the G-analysis problem, see [23]. 

1.2. Conditions and properties of CS matrices. There has been an ex¬ 
tensive amount of literature studying conditions and properties of matrices A and 
W which guarantee recoverability of a good approximation of x by solving problem 
0- For a thorough analysis we refer the reader to mmm- The previously cited 
papers use a version of the well-known Restricted Isometry Property (RIP) [9], which 
is repeated below. 

Definition 1.1. The restricted isometry constant of a matrix A G M mxn adapted 
to W G E nxl is defined as the smallest S q such that 

(1 - S q )\\Wz\\l < \\AWz\\l < (1 + S q )\\Wzf 2 


for all at most q-sparse z G E l , where E = M or C. 

For the rest of the paper we will refer to Definition 1.1 as W-RIP. It is proved 
in Theorem 1.4 in [9; that if W G E nxl has orthonormal rows with n < l and if A, 
W satisfy the W-RIP with S 2q < 8.0e-2, then the solution x c obtained by solving 
problem 0 satisfies 


( 1 . 2 ) 


Ike - x \\ 2 = Co||e|| 2 + Cl 


\\W*x c -(W*x) q h 

Vq 


where ( W*x) q is the best g-sparse approximation of TV*x, Co and C\ are small con¬ 
stants and only depend on It is clear that W*x must have l — q rapidly decaying 
components, in order for \\x c — x\\ 2 to be small and the reconstruction to be successful. 

iTV is a special case of G-analysis where matrix W does not have orthonormal 
rows, hence, result (1.2) does not hold. For iTV there are no conditions on S 2q such 
that a good reconstruction is assured. However, there exist results which directly 
impose restrictions on the number of measurements m, see Theorems 2, 5 and 6 in 123] . 
Briefly, in these theorems it is mentioned that if m > q log(n) linear measurements are 
acquired for which matrices A and W satisfy the W-RIP for some S q < 1, then, similar 
reconstruction guarantees as in (1.2) are obtained for iTV. Based on the previously 
mentioned results regarding reconstruction guarantees it is natural to assume that 
for iTV a similar condition applies, i.e. ^ < 1/2. Hence, we make the following 
assumption. 

assumption 1.2. The number of nonzero components ofW*x c , denoted by q, 
and the dimensions l, m, n are such that matrices A and W satisfy W-RIP for some 
$ 2 q <1/2. This assumption will be used in the spectral analysis of our preconditioner 
in Section [U 
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Another property of matrix A is the near orthogonality of its rows. Indeed many 
applications in CS use matrices A that satisfy 

(1-3) ||^4 T -im||2<£, 

with a small constant S > 0. Finally, through the paper we will make use of the 
following assumption 


(1.4) 


Ker(lV) n Ker(A) = {0}, 


This is commonly used assumption in the literature, see for example [26] . 

1.3. Contribution. In m , Chan, Golub and Mulet, proposed a primal-dual 
Newton Conjugate Gradients method for image denoising and deblurring problems. 
In this paper we modify their method and adapt it for CS problems. There are two 
major contributions. 

First, we propose an inexpensive preconditioner for fast solution of systems us¬ 
ing pdNCG when applied to CS problems with coherent and redundant dictionaries. 
The proposed preconditioner is a generalization of the preconditioner in [16] for CS 
problems with incoherent dictionaries. We analyze the limiting behaviour of our pre¬ 
conditioner and prove that the eigenvalues of the preconditioned matrices are clustered 
around one. This is an essential property that guarantees that only a few iterations 
of CG will be needed to approximately solve the linear systems. Moreover, we pro¬ 
vide computational evidence that the preconditioner works well not only close to the 
solution (as predicted by its spectral analysis) but also in earlier iterations of pdNCG. 

Second, we demonstrate that despite being a second-order method, pdNCG can 
be more efficient than specialized first-order methods for CS problems of our interest, 
even on large-scale instances. This performance is observed in several numerical ex¬ 
periments presented in this paper. We believe that the reason for this is that pdNCG, 
as a second-order method, captures the curvature of the problems, which results in 
sufficient decrease in the number of iterations compared to first-order methods. This 
advantage comes with the computational cost of having to solve a linear system at 
every iteration. However, inexact solution of the linear systems using CG combined 
with the proposed efficient preconditioner crucially reduces the computational costs 
per iteration. 


1.4. Format of the paper and notation. The paper is organized as follows. 
In Section [2] problem (1.1) is replaced by a smooth approximation; the G-norm is 
approximated by the pseudo-Huber function. Derivation of pseudo-Huber function is 
discussed and its derivatives are calculated. In Section [3] a primal-dual reformulation 
of the approximation to problem <H3 and its optimality conditions are obtained. 
In Section [4] pdNCG is presented. For convergence analysis of pdNCG method the 
reader is referred to mm- In Section [5| a preconditioning technique is described 
for controlling the spectrum of matrices in systems which arise. In Section [6j a 
continuation framework for pdNCG is described. In Section [7] numerical experiments 
are discussed that present the efficiency of pdNCG. Finally, in Section [8] conclusions 
are made. 

Throughout the paper, || • ||i is the G-norm, || • ||2 is the Euclidean norm, || • ||oo 
the infinity norm and | • | is the absolute value. The functions Re(-) and /ra(-) take a 
complex input and return its real and imaginary part, respectively. For simplification 
of notation, occasionally we will use Re(-) and /m(-) without the parenthesis. Fur¬ 
thermore, diag(-) denotes the function which takes as input a vector and outputs a 
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diagonal square matrix with the vector in the main diagonal. Finally, the super index 
c denotes the complementarity set, i.e. B c is the complement set of B. 

2. Regularization by pseudo-Huber. In pdNCG [TT] the non-differentiability 
of the ^i-norm is treated by applying smoothing. In particular, the £i-norm is replaced 
with the pseudo-Huber function [18] 

i 

(2.1) MWx) := £((M 2 + \W*x\ 2 f* - fi), 

i =1 


where Wi is the i th column of matrix W G E nxl and g controls the quality of ap¬ 
proximation, i.e. for (i 0, Vv( x ) t en ds to the ^i-norm. The original problem <H3 
is approximated by 


( 2 . 2 ) 


minimize fc(x) := cip /Jb (W*x) H —\\Ax — b |||. 


2.1. Derivation of pseudo-Huber function. The pseudo-Huber function (2.1) 
can be derived in a few simple steps. First, we re-write function ||fF*x||i in its dual 
form 


(2.3) IIW^Hi = sup Re(g*W*)x , 

9^ l ,\\g\\oo<^ 

where g are dual variables. The pseudo-Huber function is obtained by regularizing 
the previous dual form 


(2.4) r ip fl (W*x ) = sup 
sechiisiio, 


<i 


r )x + £ ( M (i - - m) 


i= 1 


where gi is the i th component of vector g. 

An approach which consists of smoothing a non-smooth function through its 
dual form is known as Moreau’s proximal smoothing technique [22] . Another way 
to smooth function ||VF*x||i is to regularize its dual form with a strongly convex 
quadratic function /x/2||^|||- Such an approach provides a smooth approximation of 
||IF* x ||i which is known as Huber function and it has been used in [4]. Generalizations 
of the Moreau proximal smoothing technique can be found in [24] and [3]. 


2.2. Derivatives of pseudo-Huber function. The gradient of pseudo-Huber 
function ^^(W^x) in (2.1) is given by 


VVv(VF*x) = Re(WDW*)x , 


where D := diag(D\, T> 2 , • • • , D\) with 


(2.5) 


Di := (/j , 2 + \yi\ 2 ) 2 Vi = 1,2, • • • , 1 , 


and y = [yi,y 2 , • • • ,yi] T := W*x. The gradient of function f£{x) in (2.2) is 


V/£(a;) = cV^(W*x) + A 1 (Ax - b ). 


The Hessian matrix of 'ip^(x) is 

(2.6) V 2 ip^W*x) := ^(WYW* + WYW* + WYW* + WYW*), 
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where the bar symbol denotes the complex conjugate, Y := diag Yi, Y 2 ,..., Yi , Y : 


diag 

(2.7) 


Y u Y 2 ,...,Yi and 


Yi-^Df + Di, Yi := —y 2 Df, i = l,2,...,i, 


Moreover, the Hessian matrix of f{?{x) is 

(2.8) V 2 /£(®) = cV 2 tp ll {W*x) + A T A. 

3. Primal-dual formulation and optimality conditions. In [10] the authors 
solved iTV problems for square and full-rank matrices A which were inexpensively di- 
agonalizable, i.e. image deblurring or denoising. More precisely, in the previous cited 
paper the authors tackled iTV problems using a Newton-CG method to solve problem 


(2.2). They observed that close to the points of non-smoothness of the £i-norm, the 
smooth pseudo-Huber function (2.1) exhibited an ill-conditioning behaviour. This 
results in two major drawbacks of the application of Newton-CG. First, the linear al¬ 
gebra is challenging. Second, the region of convergence of Newton-CG is substantially 
shrunk. To deal with these problems they proposed to incorporate Newton-CG inside 
a continuation procedure on the parameters c and fi. Although they showed that con¬ 
tinuation did improve the global convergence properties of Newton-CG it was later 
discussed in [Tl] (for the same iTV problems) that continuation was difficult to con¬ 
trol (especially for small fi) and Newton-CG was not always convergent in reasonable 
CPU time. 

In El, Chan, Golub and Mulet provided numerical evidence that the behaviour 
of a Newton-CG method can be made significantly more robust even for small values 
of fi. This is achieved by simply solving a primal-dual reformulation of (2.2), which 
is given below 


1 1 

(3.1) minimize sup cRe{g*W*)x + f/i(l — Iffil 2 )^ 2 — g] + -1 |Ax - 

sec MM|oo<i i=1 v 72 


The reason that Newton-CG method is more robust when applied on problem (3.1) 


than on problem (2.2) is hidden in the linearization of the optimality conditions of 
the two problems. 


3.1. Optimality conditions. The optimality conditions of problem (2.2) are 


(3.2) VVv(lV*x) + A 1 {Ax -b) = cRe{WDW*)x + A 1 {Ax - b) = 0. 
The first-order optimality conditions of the primal-dual problem (|3.1|) are 


(3.3) 


cRe{Wg) + A J {Ax-b) = 0, 
D~ x g = W*x. 


Notice for conditions (3.3) that the constraint ||g||oo < 1 in (3.1) is redundant since 


any x and g that satisfy (3.3) also satisfy this constraint. Hence, the constraint has 
been dropped. Conditions (3.3) are obtained from ( |3.2| ) by simply setting g = DW*x. 
Hence, their only difference is the inversion of matrix D. However, this small difference 
affects crucially the performance of Newton-CG. 

The reason behind this is that the linearization of the second equation in ( |3.3| ), 
i.e. D~ x g = W*x, is of much better quality than the linearization of W^^iW^x) for 
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/i « 0 and W*x w 0. To see why this is true, observe that for small fi and W*x « 0, 
the gradient V'ip lJL (W*x) becomes close to singular and its linearization is expected to 
be inaccurate. On the other hand, D~ x g = W*x as a function of W*x is not singular 
for fi « 0 and W*x « 0, hence, its linearization is expected to be more accurate. We 
refer the reader to Section 3 of HU for empirical justification. 


4. Primal-dual Newton conjugate gradients method. In this section we 
present details of pdNCG method DU- 


4.1. The method. First, we convert optimality conditions ( |3.3| ) to the real 
case. This is done by splitting matrix W = ReW - 


and the dual variables 
g = g re + V—lgim into their real and imaginary parts. We do this in order to obtain 
optimality conditions which are differentiable in the classical sense of real analysis. 
This allows a straightforward application of pdNCG method m- The real optimality 
conditions of the primal-dual problem <0 are given below 


c(ReWg re + ImWgi rn ) + A 1 [Ax — b) = 0, 
D~ 1 g re = ReW J x, D~ l gi rn = ImW J x. 


At every iteration of pdNCG the primal-dual directions are calculated by approx¬ 
imate solving the following linearization of the equality constraints in (4.1) 


B Ax = -Vf£{x) 

(4.2) A g re = D(I — Bi)ReW T Ax + DB 2 lmW J Ax — g re + DReW J x 

A g im = D(I — B^)ImW J Ax + DB 3 ReW T Ax — g im + DImW J x 

where 


(4.3) 


B := cB + A J A, 


B := ReWD(I - Bi)ReW J + ImWD{I - B 4 )ImW T + ReWDB 2 ImW J 
+ ImWB 3 DReW\ 

and Bi,i = 1,2,3,4 are diagonal matrices with components 

[-^i]u - = [{jrc ] iWjx , [-E?2]u - = Di[g r( ]iImW^x, 

[B 3 ]u := Di\g im \iReW?x, [S 4 ]i» := Di\g irn \iImWjx. 


remark 4.1. Matrix B in (4.3) is positive definite (and invertible) if || g re + 
V~A-gim\\oo < 1 and condition (1.4) are satisfied. The former condition will be main¬ 
tained through all iterations of pdNCG. 


It is straightforward to show the claim in Remark |4.1| for the case of W being 
a real matrix. For the case of complex W we refer the reader to a similar claim 
which is made in HD. page 1970. Although matrix B is positive definite under the 
conditions stated in Remark |4.1[ it is not symmetric, except in the case that W is 
real where all imaginary parts are dropped. Therefore in the case of complex matrix 
W, preconditioned CG (PCG) cannot be employed to approximately solve (4.2). To 
avoid the problem of non-symmetric matrix B the authors in [Tl] suggested to ignore 
the non-symmetric part in matrix B and employ CG to solve (4.2). This idea is based 
on the following remark. 
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1: Input: Ti e (0,1), 72 e (0,1/2), x°, g° re and g° im , where \\g° re + V^g^Woo < 1- 
2: Loop: For k = 1,2,..., until termination criteria are met. 

3: Calculate A g k e and A g k m by solving approximately the system (4.4), 


4: 


5 : 


6 : 


until (4.6) is satisfied for some g G [0,1). 


Qp-j- «—i. ^.k I n k ~^~^ n k 

out t/ re . £/ re ay fe , . y irn 


Ag k m and calculate 


5 fe+1 :=A-II 


d< 1 (^re" 1 + 


where -P||.|| 00 <i(*) is the orthogonal projection on the £oo ball. 
Then set g*+ x := Reg k+1 and g k + x := Img k+1 . 

Find the least integer j > 0 such that 

f?(x k + r{Ax k ) < f?(x k ) + t 2 t/(V/ c ^)) t A^ 

and set a := r{. 

Set x /e+1 := + aAx k . 


Fig. 4.1: Algorithm primal-dual Newton Conjugate Gradients 


remark 4.2. The symmetric part of B tends to the symmetric second-order 
derivative of fjf{x) as pdNCG converges (see Section 5 in FIT]/). 

Hence, system ( |4.2| ) is replaced with 

BAx = -V/£ 0) 

(4.4) Ag re = £>(/ - Hi)i?elF T Ax + DB 2 ImW J Ax - g re + DReW 1 x 
A gim = D(I — B/fjlmW 1 Ax + DB^ReW 1 Ax — gi m + DImW T x 

where 

(4.5) B := csym(B)-\-A 1 A 

and sym(P) := 1/2 (B+B T ) is the symmetric part of B. Moreover, PCG is terminated 
when 


(4.6) 


\\B*x + Vf?(x)\\ 2 <r,\\Vfe(x)h, 


is satisfied for g G [0,1). Then the iterate g = g re + A g re + y/—l(gi m + A gim) is 
orthogonally projected on the box {x : WxW^ < 1}. The projection operator for 
complex arguments is applied component-wise and it is defined as u := P||.|| oo < 1 (r) = 
min(l/|u|,l) © r, where © denotes the component-wise multiplication. In the last 
step, line-search is employed for the primal Ax direction in order to guarantee that 
the objective value f{f(x) is monotonically decreasing, see Section 5 of [TT] . The 
pseudo-code of pdNCG is presented in Figure [471] 


5. Preconditioning. Practical computational efficiency of pdNCG applied to 
system (4.4) depends on spectral properties of matrix B in (4.5). Those can be 
improved by a suitable preconditioning. In this section we introduce a new precon¬ 
ditioner for B and discuss the limiting behaviour of the spectrum of preconditioned 
B. 
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First, we give an intuitive analysis on the construction of the proposed precon¬ 
ditioner. In Remark IA.2I it is mentioned that the distance uj of the two solutions 
x c := argmin f c (x) and x c ^ := argmin/^(x) can be arbitrarily small for sufficiently 
small values of fi. Moreover, according to Assumption |1.2[ W*x c is q sparse. There¬ 
fore, Remark |A.2| implies that W*x c ^ is approximately q sparse with nearly zero 


components of O(uo). A consequence of the previous statement is that the compo¬ 
nents of W*x c split into the following disjoint sets 

B := {i e {1,2, •• • ,1} | \W*x c ^\ » £>(w)}, \B\ = q = |supp(IT*x c )|, 


B c := {*e{l,2,--. ,1} | | W*x c J & 


\B c \=l-q. 


The behaviour of W*x C4 , : has a crucial influence on matrix V 2 'iJj ii (W* x C4l ) in (2.6). 
Notice that the components of the diagonal matrix D, defined in ( ]2.5[ ) as part of 
V 1, ip ll iW*x C4 j), split into two disjoint sets. In particular, q components are non-zeros 
much less than 0(l/u>), while the majority, l — q, of its components are of 0(1/uj), 


(5.1) 


Di « O(-) Vi e B and A = O(-) Vi e B c 

UJ UJ 


Hence, for points close to x c ^ and small /q matrix V 2 /^(x) in ( |2.8[ ) consists of a 
dominant matrix cV 2 Vv( x ) an d °f ma t r i x A 1 A with moderate largest eigenvalue. The 
previous argument for A 1 A is due to <0- Observe that A m ax(A T A) = X max (AA T ), 
hence, if (5 injl.3) is not a very large constant, then X max (A T A) < 1 + S. According 
to Remark 4.2, the symmetric matrix sym (B) in (2.8) tends to matrix V 2 Vyt( x ) as 


x x c Therefore, matrix sym (B) is the dominant matrix in B. For this reason, in 


the proposed preconditioning technique, matrix A T A in (2.8) is replaced by a scaled 


identity p/ n , p > 0, while the dominant matrix sym(H) is maintained. Based on these 
observations we propose the following preconditioner 


(5.2) 


N := csym(H) + pl n 


In order to capture the approximate separability of the diagonal components of 
matrix D for points close to when /i is sufficiently small, we will work with 

approximate guess of B and B c . For this reason, we introduce the positive constant 
z/, such that 


#(A <v)=G. 

Here a might be different from the sparsity of W*x c . Furthermore, according to the 
above definition we have the sets 


(5.3) 


B u := {i e {1, 2, • • * , /} I Di < v} and B c v := {1, 2, • • • , 1}\B U 


with \B V \ — a and \B C V \ = l —a. This notation is being used in the following theorem, in 
which we analyze the behaviour of the spectral properties of preconditioned V 2 /^ (x), 
with preconditioner N := c\/ 2r ^^(W*x) + pl n . However, according to Remark 4.2 


matrices B and N tend to V 2 /^(x) and TV, respectively, as x —)> x cjjLt . Therefore, the 
following theorem is useful for the analysis of the limiting behaviour of the spectrum 
of preconditioned B. 
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Theorem 5.1. Let v be any positive constant and #(Di < u) = a at a point x, 
where D is defined in (|2.5|). Let 


V 2 / c ^O) = cV 2 ^(W*x) + A 1 A and N := cV 2 ^{W*x) + pl n . 

Additionally, let A and W satisfy W-RIP with some constant S a < 1/2 and let A 
satisfy (1.3) for some constant 6 > 0. 

If the eigenvectors of N~^X/ 2 f^(x)N~^ do not belong in Ker{W^ c ) and p G [S a , 1/2], 
then the eigenvalues of 7V -1 V 2 f^(x) satisfy 

| A _ 1 | < 1 X + 1 + (5x 2 - 2x + 1)* 


2c^\ rnin {Re{W B cW^ c ))Pp 1 


where A G spec(N 1 V 2 /^(x)) ; A mi n {Re(Ws^W^ c )) is the minimum nonzero eigen¬ 
value of Re(Wi 3 c Wg c ) and x := 1 + 5 — p. 

If the eigenvectors of N~^X7 2 f^(x)N~i belong in Ker(Wg c ), then 
| A _ 1 | < 1 X + 1 + (5x 2 - 2x+ 1)^ 

“2 p 

Proof. We analyze the spectrum of matrix N~^\/ 2 f^(x)N~^ instead, because it 
has the same eigenvalues as matrix N _1 X7 2 fjf(x). We have that 

N-^V 2 f?(x)N~^ = iV-5(cV 2 f/v(a:) + A T A)N~* 


= AT-5 (cVV„(aO + A t A + pl n - pI n )N~* 


-l 


= jV“5(cVWv(z) + pI n )N~2 + N~2 A t AN~ 2 - pN 
= I n + N~^ A 1 AN~^ - pN~ l 

Let u be an eigenvector of N~i\/ 2 flf(x)N~^ with ||u|| 2 = 1 and A the corresponding 
eigenvalue, then 

(I n + N-*ATAN-v -pN~ 1 )u = Xu ^ 

(N + N^A T AN~^ - pl„)u = XNu => 

u T Ni(A T A-pI n )N~iu= (X-l)u T Nu =>• 

(5.4) \u T N?(A T A - pI n )N~*u\ = |A - l|u T iVu. 

First, we find an upper bound for \u T Ni(A T A — pI n )N~^u\. Matrices N^(A T A — 
pI n )N~* and A 1 A — pl n have the same eigenvalues. Therefore, 

|u T JV*(A T A - pI n )N~iu | < A+ ox (AM - pl n ) 

where A+ aa .(-) is the largest eigenvalue of the input matrix in absolute value. Thus, 

\u J N^ (A 1 A — pI n )N~iu\ < max \v J (A 1 A — pl n )v\ 

= „ „ 2 m ,^ „2 \( PV + Q V ) J ( ATA ~ P J n)(Pv + Qv)\, 
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where P is the projection matrix to the column space of and Q = I n — P. Using 
triangular inequality we get 

I vJNv(A*A-pI n )N-*u\< max (|(Pu) T (A T A - pI n )Pv\ 

\\Pv\\l+\\Qv\\l<i 

+ \(QvY(A^A - pI n )Qv\ + 2|(Pv)T(AM - pI n )Qv |). 

Let us denote by v the solution of this maximization problem and set HP# Hi = ot and 
||Q#||i = 1 — a, where a G [0,1], then 


r Ni(A T A~ pI n )N~iu\ < (|(P#) T (W A - pI„)Pv\ 


(5.5) 


\(Qvy(A^A - P I n )Qv | + 2\(Pv) J (A 1 A - pI n )Qv |). 


Since Pv belongs to the column space of Wb„ and B„ = a, from W-RIP with 
5 a < 1/2 we have that 

\\Pv\\l{l-5 a )<\\APv\\l =» 

\\Pv\\l(l-p)<\\APv\\l ^ 

\\Pv\\l{l-2p)<\\APv\\l-p\\Pv\\l 

Since p € [<5^-,1/2] we have that 11 J F > £>11§ < which implies that if the eigen¬ 

vector corresponding to an eigenvalue of matrix A T A belongs to the column space of 
Wb„ , then the eigenvalue cannot be smaller than p. Hence, 

|(Pfi)T(AM - pI n )Pv I < \{Pv)*(A^A - pI n )Pv I = (Pv)*(A*A - pI n )Pv. 

Moreover, from W-RIP with S a < 1/2 and p G [5 a , 1/2], we also have that (Pv)*(A J A— 
pI n )Pv < \\Pv\\l Thus, 


(5.6) 


\(PvY(AiA - pI n )Pv\ < a. 


From property ( |L3| and A max (A T A) = X max (AA J ), we have that A max (A T A-pl n ) < 
1 + S — p. Finally, using the Cauchy-Schwarz inequality, we get that 


\(Qvy(A T A - pI n )Qv | < (1 + 8 - p)( 1 - a) 

| (Pv) T (A T A - pI n )Qv | < (1 + S - p)y/a(l - a). 
5.7|) and (|5.8| ) in (|5.5| ) we have that 


(5.7) 
and 

(5.8) 

Using ( |5.6| 

( 5 . 9 ) 

|^x T (A 1 A — p/ n )7V _ ^?x| < + (1 + S — p)(l — a) + 2(1 + (5 — p)xj 1 — a). 

Set x := 1 + S — p, it is easy to check that in the interval a E [0,1] the right hand side 
of (5.9) has a maximum at one of the four candidate points 


aq =0, a 2 = 1, aq,4 = -(1 ± 


(x-i ) 2 


5y 2 - 2y + 1 


1/2 


), 


where is for plus and a 4 is for minus. The corresponding function values are 

x + 1 1 3x 2 + 2 X — 1 X±1 1 (5 2_ 2 +1) i/2 

+ 2(5x 2 -2 X + 1) 1 /2’ 2 + 2 l ° X X+ ’ ’ 
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X, 1 


2 










respectively. Hence, the maximum among these four values is given for aq. Thus, 
(5.9) is upper bounded by 


(5.10) 


r Ni(A T A- pI n )N~*u\ < 


^y^ + ^(5x 2 -2x + l)i 


We now find a lower bound for u T Nu. Using the definition of D in ( |2.5| ), matrix Y 
in (2.7) is rewritten as % = (2 p 2 + \yi\ 2 )Df Vi = 1, 2, • • • , /. Thus V 2 V^(x) in (2.6) is 
rewritten as 

(5.11) 


) 3 W* 

,2/tt/ iw3ttz* 1 Jir ta3ttz*\ 


V 2 'ip fi (W*x) = j[( WD 3 W* + WD 3 W* + WYW* + WYW *) 

'j3txt'* 


+ 2p z (WD 3 W* + fUZUlU*)], 

where Di = ?/ t | 2 -D) Vi = 1,2, ••• , l. Observe, that matrix V 2 ip l ,(W* x) consists of 

two matrices WD 3 W* + WD 3 W* + WYW* + Wfw* and 2p 2 (WD 3 W* + WD 3 W*) 
which are positive semi-definite. Using ( 5.11| ) and the previous statement we get that 

u T Nu = u T (c\7 2 ipn(W*x) + pl n )u 

= ^iA{WYW* + WYW* + WYW* + Wfw*)u + p 

2 

> ^-u t (1UD 3 1U* + WD 3 W*)u + p. 


Furthermore, using the splitting of matrix D (5.3), the last inequality is equivalent to 
2 

vJ Nu = C -^~u r (W Bu D% v W i ^ + W B cE%cW£c + W Bv D\W^ + W B cD 3 Bc W^)u + p 

2 

> ^-uT(W B cD 3 Bc W^ + W B cD%cW B c)u + p. 


Using the defition of B° (5.3) in the last inequality, the quantity u T Nu is further lower 
bounded by 


(5.12) 


2 3 

u T Nu > < ^-uT(W B cW£c + W B oW B c)u + p. 


If u £ Ker(Hgc), then from ( |5.12[ ) we get 

(5.13) v7Nu > cp 2 v 3 \ min (Re(W B cW^)) + p. 

Hence, combining (5.4), (5.10) and ( 5. 13| ) we conclude that 

1 x + 1 + (5x 2 -2x + 1)5 


|A-1|< 


2 cp 2 v 3 \ min (Re(W B cW£ c )) + P 


If u E Ker(IUgc), then from (5.12) we have that u J Nu > p, hence 
| A _ 1 , < l X + l + (5x 2 -2 X + l)^ 

~ 2 p 


□ 


ll 





















Let us now draw some conclusions from Theorem |5.1| In order for the eigenvalues 
of TV -1 V 2 /^(x) to be around one, it is required that the degree of freedom v is chosen 
such that v = 0(1/p) and p is small. For such z/, the cardinality a of the set B u must 
be small enough such that matrices A and W satisfy W-RIP with constant S a < 1/2; 
otherwise the assumptions of Theorem |5.1| will not be satisfied. This is possible if 
the pdNCG iterates are close to the optimal solution x c jjLt and p is sufficiently small. 
In particular, for sufficiently small /i, from Remark A.2 we have that x C:/1 ~ x c and 


a ~ q. According to Assumption |1.2| for the ^-sparse x c , W-RIP is satisfied for 
$2 q <1/2 ==> S q < 1/2. Hence, for points close to x Cr/1 and small p we expect that 


5 a < 1/2. Therefore, the result in Theorem 5.1 captures only the limit ing behaviour 
of p reconditioned V 2 /^(x) as x x c j/x . Moreover, according to Remark < 


5.1 


4.2 


Theorem 


implies that at the limit the eigenvalues of N~ X B are also clustered around one. 

We now comment on the second result of Theorem |5.1[ when the eigenve ctor s of 
belong in Ker(Wg c ). In this case, according to Theorem 5.1 the 
preconditioner removes the disadvantageous dependence of the spectrum of V z f//(x) 
on the smoothing parameter p. However, there is no guarantee that the eigenvalues 
of A^ _1 V 2 /^(x) are clustered around one, reg ardless of the distance from the optimal 


solution x c ^. Again, because of Remark 4.2 we expect that the spectrum of TV 1 B 


at the limit will have a similar behaviour. 

The scenario of limiting behaviour of the preconditioner is pessimistic. Let a be 
the minimum sparsity level such that matrices A and W are W-RIP with S# < 1/2. 
Then, according to the uniform property of W-RIP (i.e. it holds for all at most a- 
sparse vectors), the preconditioner will start to be effective even if the iterates W*x k 
are approximately sp arse with a dominant non-zero compo nent s. Numerical evidence 


is provided in Figure 5.1 to confirm this claim. In Figure 5.1 the spectra A (B) and 
A (N~ l B) are displayed for a sequence of systems which arise when an iTV problem 
is solved. For this iTV problem we set matrix A to be a partial 2D DCT, n = 2 10 , 
m = n/4, c = 2.29e-2 and p = 5.0e-l. For the first experiment, which corresponds to 
Figures |5.1a| and [5.lc| the smoothing parameter has been set to p = 1.0e-3. For the 
second experiment, whic h co rrespo nds to Figures [5.1b| and |5.1d| we set p = 1.0e-5. 
Observe in Figures 5.1c and 5.Id that the eigenvalues of matrix N~ l B are nicely 


clustered around one. On the other hand in Figures 5.1a| and |5.1b| the eigenvalues of 
matrix B have large variations and there are many small eigenvalues close to zero. 
Notice that the preconditioner was effective not only at optimality as it was predicted 
by theory, but through all iterations of pdNCG. This is because starting from the 
zero solution the iterates W*x k were maintained approximately sparse Vfc. 

Additionally, in Figure [^2] we show the number of CG/PCG iterations and time 
required for the unpreconditioned and the preconditioned cases of the same experi¬ 
ment. Observe in Figures 5.2a| (p = 1.0e-3) and 5.2b| (p = 1.0e-5) that the number 
of PCG iterations are much less than the number of CG iterations. Surprisingly, the 
number of CG iterations required for the experiment with p = 1.0e-3 were more than 
the number of iterations for CG for the experiment with p = 1.0e-5. Although, matrix 
B has worse condition number in the latter case, see the values of the vertical axis in 
Figures [5. la| and |5.1b| We believe that this is because of the slightly better clust ering 


of the eigenvalues of matrix B for the experiments with p = 1.0e-5; see Figures |5. la 
and |5.1b| Finally, PCG was faster than CG in terms of required time for convergence, 
see Figures |5.2c| and |5.2d 


5.1. Solving systems with the preconditioner. In this subsection we discuss 
how we can solve systems with the proposed preconditioner N. 
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(c) Preconditioned (d) Preconditioned 

Fig. 5.1: Spectra of A (B), A(7V _1 5) when pdNCG is applied with smoothing parame¬ 
ter (i = 1.0e-3 (left column of sub-figures) and fi = 1.0e-5 (right column of sub-figures). 
Matrix A in B is a 2 D DCT, n = 2 10 , m = n /4 and c = 2.29e-2. Seventeen systems 
are solved in total for each experiment. 


The simplest case is when W is an orthogonal matrix. In this case it is readily 
verified that solving systems with matrix N costs two matrix vector products with 
matrices W , W T and the inversion of a diagonal matrix. Therefore, this operation is 
inexpensive. Especially, when matrix IF is a DCT or a Wavelets transform for which 
matrix-vector products with W and W J can be calculated in 0(nlogn) and G(n) 
time, respectively. 

Let us now consider iTV problems. Let p v be the vertical number of pixels of the 
image to be reconstructed and ph be the horizontal number of pixels. For simplicity 
we will assume that the image is square, hence, p = p v = Ph- Additionally, we assume 
that the image is handled in a vectorized form, i.e., instead of an image of size p x p 
we have a vectorized image of size p 2 x 1 where the columns of the image are stuck 
one after the other. In this case, for iTV the IF G C nxn matrix in problem 0) is 
square with n = p 2 , rank-deficient with rank(W ) = n — 1. Matrix IF corresponds to 
a discretization of the nabla operator and it measures local differences of pixels when 
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(a) CG/PCG iterations 


(b) CG/PCG iterations 



(c) CG/PCG time (d) CG/PCG time 


Fig. 5.2: Number of CG/PCG iterations and required time when pdNCG is applied 
with smoothing parameter /i = 1.0e-3 (left column of sub-figures) and fi — 1.0e-5 
(right column of sub-figures). Matrix A in B is a 2D DCT, n = 2 10 , m = n /4 and 
c = 2.29e-2. Seventeen systems are solved in total for each experiment. 


applied on a vectorized image. In particular, 


w = w v + W h , 

where W v G M nxn and Wh G M nxn . Matrix W v measures vertical differences of pixels 
when applied on a vectorized image and it has the following non-zero components: 

[W^ v \p(j—i)+i 1 and [W v ]p^j—i^i,p(j— i)-h+i 1 

Vj = 1 , 2, • • • and \/i = 1 , 2, • • • ,p— 1. Matrix Wh measures horizontal differences of 
pixels when applied on a vectorized image and it has the following non-zero pattern: 

i)+i,p(j— i)+i 1 and [W^h]p(j-i)-\-i,p(j— i)+i+p 1 

\/j = 1 , 2, • • • ,p — 1 and Vi = 1 , 2, • • • ,p. 
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following block tridiagonal form 


(5.14) 


N = 


this 

case 



c 2 

K j 2 

K 2 

Os 


K 3 


c P -1 k;_, 

K p -i Cp 


where C{ G M. pxp are tridiagonal matrices Vi = 1, 2, • • • ,p and G M pxp are upper 
bidiagonal matrices Vi = 1,2,... ,p — 1. Solving systems with the symmetric positive 
definite block tridiagonal matrix N can be done in 0(p 4 ) time by calculating its 
Cholesky decomposition without re-ordering. More precisely, the Cholesky factor L 
of N = LL J is of the form 


(5.15) 


L = 


Li 

Ui L 2 

u 2 l 3 
u 3 


Lp-i 

Up—i L 


PA 


where Li G M. pxp Vi = 1, 2, • • • ,p and Ui G W )Xp Vi = 1, 2, • • • ,p — 1. The factor L 


can be calculated by using N = LZJ, (5.14) and (5.15) to get 


(5.16) 

(5.17) 

(5.18) 


LiL] = Ct 

UiLj = Ki V* = 1,2,--- ,p-l 
U i -iUj_ 1 + LiLj = Ci Vi = 2, 3, • • • ,p. 


Notice that C i in (5.14) is symmetric positive definite because N is a symmetric pos¬ 


itive definite matrix. Since C\ is symmetric positive definite and tridiagonal we can 
obtain the lower bidiagonal matrix L\ in (5.16) by calculating the Cholesky decompo¬ 
sition of matrix C\. Calculation of L\ can be done in 0(p) time. Matrix U\ is upper 
diagonal and can be calculated in 0(p 3 ) time by solving U\L\ = K\. The next step 
is the calculation of L 2 using ( 5.18| ), which is the Cholesky factor of C 2 — U\U\. The 
term C 2 — U\ Uj can be calculated in 0(p 3 ). Notice that C 2 — U\U\ = C 2 — K\C^ X K\ 
is the Schur complement of a two by two block matrix 


S = 


'C! K\ 

K x C 2 


Matrix S is symmetric positive definite because matrix N is positive definite, this can 
be readily seen from (5.14). Hence, the Schur complement C 2 — K\C^[ X K\ of S is 
a symmetric positive definite matrix. By repeating this process p times using Q5.17 ) 
and (|5.18|) we obtain the matrices Li and Ui. Since each step requires 0(p 3 ) time, 


the total calculation of L requires (9(p 4 ). 

We note that this 0(p 4 ) operation is expensive for the problems of our interest. 
Therefore, instead of naively calculating the factor L we employ AMD (Approximate 
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Table 5.1: Scaling of running time of Cholesky factorization with ADM ordering, p 
denotes the number of pixels of a p x p image. 


p 

16 

32 

64 

128 

256 

512 

1024 

2048 

CPU (sec) 

3.6e-4 

1.4e-3 

6.1e-3 

3.0e-2 

1.6e-l 

9.1e-l 

5.2 

30.6 


Minimum Degree) ordering pQ by invoking MATLAB’s backslash operator. AMD 
results in significant reduction in the F LOPS (FLoating-point Operations Per Second) 
rate and the filling in L. In Table 


5.1 


we present how the running time of MATLAB’s 
backslash operator scales as p increases from to 2 3 to 2 11 . The results are averaged 
over 20 trials. Observe that the running time scales nearly 0(p 2 ), which is a significant 
improvement compared to 0(p 4 ). The matrices N in this experiment were constructed 
using pdNCG. Additionally, for this experiment we run MATLAB only in one thread 
in order to eliminate the effects of multithread implementations. 

Unfortunately, solving systems with the proposed preconditioner is not always an 
inexpensive procedure. In particular, solving systems with matrix N when W has 
orthonormal rows is a non-trivial operation. An example is radar and sonar systems 
ED, where W E M. nxl is a Gabor frame with n < /. In this case, N does not have a 
structure which can be exploited in order to solve systems with it inexpensively. In 
similar cases in the literature, i.e., denoising of images ED, attempts have been made 
to solve systems with the preconditioner using an iterative method. In our case N 
is a symmetric positive definite matrix. Therefore, CG^ can be used, where CG ^ 
denotes conjugate gradients method, but we add a subscript N in order to distinguish 


from the unpreconditioned CG notation for the system in (4.4). 


Our personal numerical experience regarding this strategy suggests that it is dif¬ 
ficult to control the time required by CG^ to solve systems with N approximately, 
such that the overall time required by PCG is reduced. Although, the number of PCG 
iterations is decreased, even with a small number of CG^ iterations. In Figure 5.3 


we present an example where using the preconditioner N in an iterative fashion does 
not decrease the time required by pdNCG. The tested problem is a radar tone recon¬ 
struction instance [5], where W Gl nxl is a Gabor frame with n = 2 13 , l = 228864, 
p = 1.0e-5, c = 2.41e-5 and p = 5.0e-l. Matrix A is a block-diagonal with ±1 entries 
in the blocks and m = 648 rows. We make four experiments. For the first three 
experiments we vary the number of CG^ iterations for solving systems with N. The 
number of CG^ iterations is set to 5, 10 and 20, respectively for each experiment. 
The last experiment is using unpreconditioned CG. Observe in Figure |5l3a| that PCG 
requires significantly fewer iterations than unpreconditioned CG for all settings of 


CGfi. However, notice in Figure 5.3b the time required by PCG is larger than the 
time required by CG. 

6. Continuation. In the previous section we have shown that by using precon¬ 
ditioning, the spectral properties of systems which arise can be improved. However, 
for initial stages of pdNCG a similar result can be achieved without the cost of having 
to apply preconditioning. In particular, at initial stages the spectrum of B can be 
controlled to some extent through inexpensive continuation. Whilst preconditioning 
is enabled only at later stages of the process. Briefly by continuation it is meant that 
a sequence of “easier” subproblems is solved, instead of solving directly problem (2.2). 


The reader is referred to Chapter 11 in [25] for a survey on continuation methods in 
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(a) CG/PCG iterations 


(b) CG/PCG time 


Fig. 5.3: Number of CG/PCG iterations and required time when (i = 1.0e-5 and 
systems with the preconditioner N are solved approximately using conjugate gradi¬ 
ents. PCG 5 , PCG 10 and PCG 20 correspond to PCG, where systems are solved with 
N approximately using conjugate gradients which is terminated after 5, 10 and 20 
iterations, respectively. 


l: Outer loop: For j = 0,1, 2,..., $, produce (c ?, 

2: Inner loop: Approximately solve the subproblem 

minimize ( x ) 

using pdNCG and by initializing it with the solution 
of the previous subproblem. 

Fig. 6.1: Continuation framework 


optimization. 

In this paper we use a similar continuation framework to gi mamma. In partic¬ 
ular, a sequence of sub-problems ( 2 . 2 ) is solved, where each of them is parameterized 
by c and /i simultaneously. Let c and p be the final parameters for which problem 
(2.2) must be solved. Then the number of continuation iterations il is set to be the 
maximum order of magnitude between 1/c and 1 /p. For instance, if c = 1.0e-2 and 
p = 1.0e-5 then il := max( 2 , 5) = 5. If il > 2 , then the initial parameters c° and /i° 
are both always set to 1 . 0 e-l and the intervals [c°, c] and [/i°, p] are divided in il equal 
subintervals in logarithmic scale. For all experiments that we have reported in this 
paper we have found that this setting leads to a generally acceptable improvement 
over pdNCG without continuation. The pseudo-code of the proposed continuation 
framework is shown in Figure [ 6 T| 

Figure [ 6 ^ 2 ] shows the performance of pdNCG for four cases, no continuation and 
no preconditioning, no continuation with preconditioning, continuation with precondi¬ 
tioning through the whole process and continuation with preconditioning only at later 

shows the relative error \\x k — II2/||2- 


stages. The vertical axis of Figure 


6.2 
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Fig. 6.2: Performance of pdNCG for four different settings, i) no continuation and 
no preconditioning, ii) no continuation with preconditioning, iii) continuation with 
preconditioning through all iterations and iv) continuation with preconditioning only 
at later stages. The vertical axis presents the relative error \\x k — ^c,/2II2/||2 5 
where x^p is the optimal solution for the parameter setting c, fl in problem (2.2). 
The horizontal axis is in log-scale. 


The optimal x^p is obtained by using pdNCG with parameter tuning set to recover 
a highly accurate solution. The horizontal axis shows the CPU time. The problem is 
an iTV problem where matrix A is a partial 2D DCT, n = 2 16 , m = n/4, c = 5.39e-2 
and p = 5.0e-l. The final smoothing parameter fl is set to 1.0e-5. For the experiment 
that preconditioning is used only at later stages of continuation; preconditioning is 
enabled when p? < 1.0e-4, where j is the counter for continuation iterations. All 
experiments are terminated when the relative error \\x k — ||2 /|| 2 < 1 . 0 e-l. 

Solving approximately the problem is an acceptable practice since the problem is 
very noisy (i.e. signal-to-noise-ratio is 10 decibel) and there is not much improvement 
of the reconstructed image if more accurate solutions are requested. Finally, all other 
parameters of pdNCG were set to the same values for all four experiments. Observe 
in Figure [6^2] that continuation with preconditioning only at late stages was the best 
approach for this problem. 

7. Numerical experiments. In this section we demonstrate the efficiency of 
pdNCG against state-of-the-art methods for CS. We briefly discuss existing methods, 
describe the setting of the experiments and finally present numerical results. All 
experiments that are demonstrated in this paper can be reproduced by downloading 
the software from http: //www. maths . ed. ac . uk/ERGO/pdNCG/, 

7.1. Existing algorithms. We compare pdNCG with three state-of-the-art 
first-order methods, TFOCS [3], TVAL3 [IT and TwIST JJ. 

- TFOCS (Templates for First-Order Conic Solvers) is a MATLAB software 
for the solution of signal reconstruction problems. TFOCS solves the dual 
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problem of 


(7.1) 


minimize c||VF*x||i + ^r~\\x — x 


■c*\\l+ l -\\Ax-b\\ 


where /iTi is a positive constant. TFOCS also solves the dual problem of 


(7.2) 


mm 


Mt 2 

2 


subject to: 


\\W*x\\ 

|| Ax - b\\ 2 < e 


x — X 


where e > 0. Although problems 0 and \7.2\ are non-smooth, the regular¬ 
ization terms /iT x /2||x — rc° ||| and / xt 2 /2 \\x — ||| yield smooth convex dual 

problems, which can be solved by standard first-order methods. In particu¬ 
lar, the smooth dual problems are solved using the Auslender and Teboulle’s 
accelerated first-order method [2|. In our experiments we present results for 
TFOCS for both problems ( |7.1| ) and (7.2). We denote by TFOCS.unc the 
version that solves the unconstrained problem ( |7.1| ) and by TFOCS_con the 
version that solves the constrained problem ( |7.2| ). TFOCS can be downloaded 
from [5]. 

TVAL3 (Total-Variation minimization by Augmented Lagrangian and Alter¬ 
nating direction ALgorithms) is a MATLAB software for the solution of sig¬ 
nal reconstruction problems regularized with the total-variation semi-norm. 
TVAL3 reformulates problem 0 to the equivalent problem 


(7.3) 


minimize 


l 

E 

i=1 


\\n]x\\ 2 + ±- c \\Ax-b\\l 


where = [ReWi, ImWj\ G M nx2 . Then it solves the augmented Lagrangian 


reformulation of problem (7.3), which is 


(7.4) 


minimize 


B 






Ui )) + — \\Ax- 


III, 


where iq, v>i G M 2 and /3, c are positive constants. The augmented Lagrangian 
in (7.4) is minimized for variables x G M n and iq i = 1, 2, • • • , 1. The param¬ 


eters Vi i = 1, 2, • • • , l are handled by the method. 

- TwIST (Two-step Iterative Soft Thresholding): is also a MATLAB software 
for signal/image processing problems. TwIST solves problem (1.1). TwIST is 
a nonlinear two-step iterative version of 1ST, which according to its authors 
is more effective on ill-conditioned and ill-posed problems. 

Another solver is NestA [4] (by the same authors as TFOCS) which can also solve 
(1.1) but it is applicable only in the case that (AA T ) -1 is available. Additionally, 


TFOCS is the sucessor of NestA, since both apply the similar techniques but TFOCS 
is a newer and allows of more control options. Another method is the Primal-Dual 
Hybrid Gradient (PDHG) of [14]. PDHG has been reported to be very efficient 
for imaging applications such as denoising and deblurring, for which matrix A is 
the identity or a square and full-rank matrix which is inexpensively diagonalizable. 
Unfortunately, this is not the case for the CS problems that we are interested in. 
However, for all earlier mentioned methods the matrix inversion can be replaced with 
a solution of a linear system at every iteration of the methods or a one-time cost of a 
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factorization. To the best of our knowledge, there are no available implementations 
with such modifications for these methods. 

There exists also a generic proximal algorithm for total-variation [12] and the 
Generalized Iterative Soft Thresholding (GISTA) in [20] for which we do not have 
generic implementations for CS problems. 


7.2. Equivalent problems. Solvers pdNCG, TFOCS_unc, TVAL3 and TwIST 
solve the penalized least squares problem <H3 , while TFOCS_con solves the con¬ 
strained least squares problem (7.2). 

In our experiments we put significant effort in calibrating the parameters c and 
e for the penalized and constrained least-squares problems, respectively, such that all 
methods solve similar problems. First, we set e = ||fe — b ||2 h 1 (7.2), where b is the 
noiseless sampled signal. Hence problem (7.2) is parameterized with the optimal e. 
Then we find an approximation of the optimal c. By optimal c we mean the value of 
c for which problems < 0 > and ( |7.2| ) are equivalent if e = j |6 — b \\2 and pr 2 = 0- Let 
uo denote the optimal Lagrange multiplier of (7.2). If e = \\b — b ||2 and pt 2 = 0, then 
it is easy to show that for c := 2/uo problems (1.1) and ( |7.2| ) are equivalent. 

The exact optimal Lagrange multiplier uj is not known a-priori. However it can 
be calculated by solving to high accuracy the dual problem of (7.2) with TFOCS by 
setting fiT 2 ~ 0 in (7.2). Then we set c := 2/u. If b is not available, then e is set such 
that a visually pleasant solution is obtained. 


7.3. Parameter tuning and hardware. The parameter p of pdNCG is set 
to fi = 1.00e-5, which for the problems of our interest resulted in solutions with the 
similar or better accuracy than the compared methods. The parameter 77 in (4.6) is set 
to 1.0e-l, the maximum number of backtracking line-search iterations is fixed to 10. 
Moreover, the backtracking line-search parameters t\ and 72 in step 5 of pdNCG (Fig. 


4.1) are set to 9.0e-l and 1.0e-3, respectively. The constant p of the preconditioner 


in ( |5.2| ) is set to 5.0e-l. 

For TVAL3 parameter f3 is set to p = 2 8 based on suggestions of its authors in m 
and our personal experience. Moreover, continuation was enabled in order to enhance 
the performance of the method. Any other parameters that were not discussed are 
set to their default values. 

We tune TwIST based on comments/suggestions of its authors and personal ex¬ 
perience. In particular, parameter A is set to A = 0.04 and the maximum number of 
iterations for the iTV denoising procedure is set to 10. 

The version 1.3.1 of TFOCS has been used. The termination criterion of TFOCS 
is by default the relative step-length. The tolerance for this criterion is set to the 
default value, except in cases that certain suggestions are made in TFOCS software 
package or the corresponding paper [5,. The default Auslender and Teboulle’s single¬ 
projection method is used as a solver for TFOCS. Moreover, as suggested by the 
authors of TFOCS, appropriate scaling is performed on matrices A and IF, such that 
they have approximately the same Euclidean norms. All other parameters are set to 
their default values, except in cases that specific suggestions are made by the authors. 
Generally, regarding tuning of TFOCS, substantial effort has been made to guarantee 
that problems are not over-solved. 

All solvers are MATLAB implementations and all experiments are performed on 
a MacBook Air running OS X 10.10.1 with 2 GHz (3 GHz turbo boost) Intel Core 
Duo i7 processor using MATLAB R2012a. The cores were working with frequency 
2.7-3 GHz during the experiments and we did not observe any CPU throttling. 


20 


















7.4. Termination criteria. For images we measure the quality of the recon¬ 
structed solutions by using the Peak-Signal-to-Noise-Ratio (PSNR) function 

PSNR=101»g„(!^), 

where peakval is the range of the image datatype, in this case the range is one since 
we work with black and white images, and MSE is the mean squared error between 
the solution and the original noiseless image. For other types of signals we measure 
the quality of the reconstructed solutions by measuring their Signal-to-Noise-Ratio 
(SNR). 

We terminate pdNCG, TVAL3 and TwIST when the PSNR (for images) or the 
SNR (for other types of signals) of their solution is equal or larger than the PSNR 
or SNR of the solution obtained by TFOCS_unc. This way we make sure that all 
methods which solve the penalized least-squares problem terminate when a solution 
of the same quality as the one of TFOCS.unc is obtained. As we mentioned in 
Subsection [A3] when we use TFOCS_unc we do not over-solve the problem, otherwise, 
we would favour pdNCG, which is a second-order method. Since pdNCG, TVAL3, 
TwIST and TFOCS_unc solve the same problem with the same penalty parameter 
c then we can make a fair comparison of their performance. The solution obtained 
by TFOCS.con might differ slightly in terms of PSNR or SNR compared to the ones 
obtained by pdNCG, TVAL3, TwIST and TFOCS_unc. However, this is because 
we set parameter c to be approximately close to the optimal value that makes the 
penalized and the constrained problems equivalent. 


7.5. Problems sets. We compare the solvers pdNCG, TFOCS, TVAL3 and 
TwIST on image reconstruction problems which are modelled using iTV. We separate 
the images to be reconstructed into two sets, which are shown in Figures 7.1 and 


7.2| Figure [7T] includes some standard images from the image processing community. 
There are seven images in total, the house and the peppers, which have 256 x 256 
pixels and Lena, the fingerprint, the boat and Barbara, which have 512 x 512 pixels. 
Finally, the image Shepp-Logan has variable size depending on the experiment. Figure 
7.2 includes images which have been sampled using a single-pixel camera m- Briefly 
a single-pixel camera samples random linear projections of pixels of an image, instead 
of directly sampling pixels. The problem set can be downloaded from http://dsp. 
rice.edu/cscamera. In this set there are in total five sampled images, the dice, the 
ball, the mug, the letter R and the logo. Each image has 64 x 64 pixels. 

Moreover, we present the performance of pdNCG and TFOCS on the recovery 
of radio-frequency radar tones. This problem has been first demonstrated in Subsec¬ 
tion 6.5 of [5]. We describe again the setting of the experiment. The signal to be 
reconstructed consists of two radio-frequency radar tones which overlap in time. The 
amplitude of the tones differs by 60 dB. The carrier frequencies and phases are chosen 
uniformly at random. Moreover, noise is added such that the larger tone has SNR 
60 dB and the smaller tone has SNR 2.1e-2 dB. The signal is sampled at 2 15 points, 
which corresponds to Nyquist sampling rate for bandwidth 2.5 GHz and time period 
approximately 6.5e+3 ns. The reconstruction is modelled as a CS problem where 
matrix A E M mxn j g block-diagonal with =bl for entries, n = 2 15 and m = 2616, 
i.e. subsampling ratio m/n ~ 7.9e-3. Moreover, W E M nxZ is a Gabor frame with 
l = 915456. 

7.6. Dependence of pdNCG on smoothing parameter. In this subsection 
we present the performance of pdNCG with and without preconditioning for decreas- 
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(g) Shepp-Logan: 
variable size 


Fig. 7.1: Benchmark images, the number of pixels for each image is given in the 
sub-captions . For Figure 17. lgl the size varies depending on the experiment 


ing values of the smoothing parameter fi. For this experiment we use the images from 
Figures |7.1a| to |7.1f| The CS matrix for all experiments is a partial Discrete Cosine 
Transform (DCT) matrix with m ^ n /4 and n is equal to the number of pixels of 
each image in Figure |7T| For all experiments the sampled signals have PSNR equal 
to 20 decibels (dB). 


The results of the experiments are shown in Table 7.1 In Table 7.1 notice that for 


the preconditioned case there is always a large increase in CPU time from fi = 1.0e-02 
to 1.0e-04. This is because for fi — 1.0e-02 pdNCG relies only on continuation, while 
for values of /i equal or smaller than 1.0e-04 preconditioning is necessary and it is 
automatically activated using the technique described in Section [6j Overall pdNCG 
had a stable performance with respect to the smoothing parameter fi. This is due 
to the good performance of the proposed preconditioner. Notice that without the 
preconditioner the performance of pdNCG for /i < 1.0e-04 worsens noticeably. In 
particular, for some experiments the unpreconditioned pdNCG required more than 3 
hours of CPU time. For these experiments we forced termination of the method and 
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(d) Letter (e) Logo 



Fig. 7.2: Benchmark images, which were sampled using the single-pixel camera m 


we do not report any results. 


7.7. Dependence on problem size. We now present the performance of meth¬ 
ods pdNCG, TFOCS, TVAL3 and TwIST as the size of the problem n increases. The 
image from Figure |7.1g| has been used for this experiment. Again, the CS matrix for 
all experiments is a partial Discrete Cosine Transform (DCT) matrix with m ~ n/ 4. 
The sampled signals have PSNR equal to 20 dB. 

The results of this experiment are shown in Table |7.2| Observe that all methods 
exhibit a linear-like increase in CPU time as a function of the size of the problem. We 
denote with bold the problem for which pdNCG was the fastest method. 

In this table we present the PSNR of the recovered solutions for each solver. 
Notice that TVAL3 does not converge always to a solution of similar PSNR as the 
other solvers. Although, we put significant effort to tune its parameters. A similar 
performance of TVAL3 is observed for many of the experiments in the subsequent 
subsections. Moreover, observe that TwIST was much slower than the other methods 
and it did not converge to a solution of similar PSNR to TFOCS or pdNCG for all 
experiments. Similar performance for TwIST has been observed in m- 


7.8. Dependence on the level of noise. In this experiment we compare the 
solvers pdNCG, TFOCS and TVAL3 as the level of noise increases. We exclude 
TwIST from this and subsequent experiments due to its poor performance on the 
simple synthetic experiment reported in Subsection |7.7[ similar performance has been 
observed in [T9] . For this experiment we use the images from Figures [~7.1a| to 7.If The 
CS matrix for all experiments is a partial Discrete Cosine Transform (DCT) matrix 
with m n/ 4. 

In Table |7.3| we present the performance of the methods for this experiments. 
In the second column of Table |7.3| the PSNR is shown, which is decreasing from 95 
dB to 20 dB in six steps. The rest of the table shows the CPU time, which was 
required by each solver. Overall pdNCG has good performance for problems with 
large level of noise, i.e., PSNR equal to 20 dB. We denote with bold the problems 
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Table 7.1: Performance of pdNCG for decreasing values of the smoothing parameter 
fi. For this experiment the images from Figures |7.1a| to |7.1f| have been used. The 
table shows the CPU time in seconds required for preconditioned pdNCG and unpre¬ 
conditioned pdNCG for each combination of fi and problem. PSNR corresponds to 
the reconstructed solution of pdNCG. 



CG/PCG 

House 

Peppers 

Lena 

Fingerprint 

Boat 

Barbara 


PCG 

4 

4 

22 

19 

23 

19 

1 .0e-02 

CG 

4 

4 

22 

19 

22 

19 


PSNR 

19.2 

24.4 

25.6 

18.1 

24 

22.6 


PCG 

8 

9 

40 

136 

50 

42 

1.0e-04 

CG 

10 

11 

85 

155 

85 

57 


PSNR 

19.2 

24.4 

25.6 

18.1 

24 

22.6 


PCG 

10 

10 

56 

203 

57 

85 

1.0e-07 

CG 

82 

98 

1165 

2839 

1519 

852 


PSNR 

19.3 

24.4 

25.6 

18.1 

24 

22.6 


PCG 

11 

12 

73 

182 

76 

86 

1 .0e-10 

CG 

273 

195 

3484 

- 

3208 

3508 


PSNR 

19.3 

24.4 

25.6 

18.1 

24 

22.6 


PCG 

11 

12 

66 

232 

68 

85 

1.0e-13 

CG 

265 

242 

4834 

- 

5356 

- 


PSNR 

19.3 

24.4 

25.6 

18.1 

24 

22.6 


Table 7.2: Performance of pdNCG, TFOCS, TVAL3 and TwIST for increasing prob¬ 
lem size. The image Shepp-Logan from Figure |7Tg has been used for this experiment. 
The table shows the required CPU time and the PSNR of the recovered solutions for 
each solver. 


Solver 

n= 

64 2 

128 2 

256 2 

512 2 

1024 2 

TFOCS_con 

CPU time (sec) 

16 

23 

55 

264 

1034 


PSNR 

15.8 

17.4 

17.9 

18 

18 

TFOCS_unc 

CPU time (sec) 

19 

30 

79 

385 

1477 


PSNR 

15.8 

17.4 

17.9 

18 

18 

TVAL3 

CPU time (sec) 

1 

2 

250 

1365 

4843 

PSNR 

15.8 

17.4 

17.2 

17.2 

17.3 

TwIST 

CPU time (sec) 

28 

135 

259 

2149 

7223 

PSNR 

13.5 

15.9 

16.8 

16.9 

16.9 

pdNCG 

CPU time (sec) 

2 

6 

13 

62 

237 

PSNR 

15.8 

17.4 

17.9 

18 

18 


for which pdNCG was the fastest solver. In this table we use the star superscript to 
denote solvers, which solve the unconstrained problem (1.1) but do not converge to a 
solution of equal or larger PSNR than the solutions of TFOCS_unc. 

In Table [7~4] we show the PSNR for the solutions calculated by TFOCS_con for 
the corresponding experiments in Table 7.3 For experiments that are not denoted 
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Table 7.3: Performance of pdNCG, TFOCS and TVAL3 for increasing level of noise 
(decreasing PSNR). PSNR is measured in dB. For this experiment the images from 
Figures |7.1a| to |7Tf| have been used. The table shows the CPU time in seconds required 
by each solver. By the star superscript we denote methods that failed to converge to 
a similar solution as the other solvers. 


Solver 

PSNR 

House 

Peppers 

Lena 

Fingerprint 

Boat 

Barbara 


95 

55 

55 

262 

261 

263 

263 


80 

55 

55 

261 

263 

265 

262 

TFOCS_con 

65 

55 

55 

262 

264 

265 

262 

50 

55 

55 

262 

262 

262 

262 


35 

55 

55 

261 

262 

262 

262 


20 

54 

55 

263 

262 

262 

261 


95 

76 

76 

382 

381 

390 

382 


80 

76 

76 

383 

385 

388 

381 

TFOCS_unc 

65 

76 

76 

384 

399 

386 

381 

50 

75 

76 

383 

383 

382 

381 


35 

76 

76 

382 

382 

383 

382 


20 

76 

76 

382 

379 

380 

379 


95 

3 

3 

16 

10 

38 

55 


80 

3 

3 

16 

10 

38 

55 

TVAL3 

65 

3 

4 

10 

10 

37 

56 

50 

3 

3 

383 

10 

24 

44 


35 

3 

3 

1104* 

8 

36 

1067* 


20 

184* 

185* 

999* 

8 

10 

987* 


95 

20 

82 

145 

141 

182 

209 


80 

20 

82 

146 

142 

182 

208 

pdNCG 

65 

20 

243 

145 

142 

178 

209 

50 

20 

55 

169 

142 

116 

209 


35 

13 

38 

109 

138 

111 

202 


20 

16 

16 

101 

156 

103 

105 


with a star superscript in Table [U3| the solvers pdNCG, TFOCS_unc and TVAL3 
obtained solutions of similar PSNR due to our setting described in Subsection |7.2| 


7.9. Dependence on number of measurements. In this experiment we com¬ 
pare the three methods for a decreasing number of measurements m. For this experi¬ 
ment we use the images from Figures [7. la| to 7.If The CS matrix is a partial Discrete 
Cosine Transform (DCT) matrix. For all experiments the sampled signals have PSNR 
equal to 20 dB. 

We denote with bold 


The results of this experiment are shown in Table 7.5 


the problems for which pdNCG was the fastest method. In Table [776] we show the 
PSNR for the solutions calculated by TFOCS_con for the corresponding experiments 
in Table [73] For experiments that are not denoted with a star superscript in Table 
7.5 the solvers pdNCG, TFOCS_unc and TVAL3 obtained solutions of similar PSNR 


due to our setting described in Subsection 7.2 
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Table 7.4: This table shows the PSNR for the solutions calculated by TFOCS.con 
for the corresponding experiments in Table 7.3 For this experiment the images from 
Figures 7.1a to 7. If have been used. 


Solver 

PSNR 

House 

Peppers 

Lena 

Fingerprint 

Boat 

Barbara 


95 

20 

31.9 

29.5 

20.1 

27.6 

25 


80 

20 

31.9 

29.5 

20.1 

27.6 

25 

TFOCS.con 

65 

20 

31.8 

29.5 

20.1 

27.6 

25 

50 

20 

31.6 

29.5 

20.1 

27.5 

24.9 


35 

19.9 

29.7 

28.6 

19.7 

26.9 

24.7 


20 

19.2 

24.3 

25.6 

17.9 

24 

22.6 


Table 7.5: Performance of pdNCG, TFOCS and TVAL3 for decreasing number of 
measurements m. In the second column the percentage of measurements is shown, 
for example 75% means that m ~ 3n/4, where n is the number of pixels in the image 
to be reconstructed. For this experiment the images from Figures [7.1a| to 7.If have 
been used. The table shows the CPU time in seconds required by each solver. By 
the star superscript we denote methods that failed to converge to a similar solution 
as the other solvers. 


Solver 

m 

House 

Peppers 

Lena 

Fingerprint 

Boat 

Barbara 


75% 

58 

58 

273 

272 

272 

273 

TFOCS.con 

50% 

56 

56 

268 

266 

268 

270 


25% 

54 

55 

263 

261 

265 

263 


75% 

78 

78 

392 

388 

393 

396 

TFOCS.unc 

50% 

77 

77 

389 

383 

387 

389 


25% 

76 

76 

382 

378 

386 

380 


75% 

296* 

301* 

1584* 

1587* 

1497* 

1250* 

TVAL3 

50% 

262* 

253* 

8 

7 

7 

10 


25% 

184* 

184* 

998* 

8 

10 

986* 


75% 

23 

23 

132 

134 

136 

140 

pdNCG 

50% 

21 

21 

131 

162 

135 

137 


25% 

16 

16 

101 

156 

104 

105 


7.10. Single-pixel camera. We now compare TFOCS with pdNCG on realistic 
image reconstruction problems where the data have been sampled using a single¬ 
pixel camera m- In this experiment we compare our solver only with TFOCS_con. 
This is because in all previous experiments TFOCS.con was faster than TFOCS.unc. 
Additionally, we were not able to make TVAL3 to converge to a solution which was 
as visually pleasant as the solutions obtained by TFOCS_con and pdNCG. We believe 
that this is due to the different CS matrix A in these experiments. In particular, 
matrix A E M mxn , where n = 64 2 and m ~ 0.4n, is a partial Walsh basis which takes 
values 0/1 instead of ±1. We noticed that this matrix A does not satisfy the RIP 
property in Definition l.l| with small S q . Therefore, the least squares term in problem 
© might be ill-conditioned and this causes difficulties for TVAL3. 

Moreover the optimal solutions are unknown and additionally the level of noise is 
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Table 7.6: This table shows the PSNR for the solutions calculated by TFOCS.con 
for the corresponding experiments in Table 7.5 For this experiment the images from 
Figures 7.1a to 7. If have been used. 


Solver 

m 

House 

Peppers 

Lena 

Fingerprint 

Boat 

Barbara 


75% 

19.6 

27.9 

27.2 

19.9 

25.9 

24.8 

TFOCS.con 

50% 

19.5 

26.6 

26.7 

19.2 

25.2 

23.8 


25% 

19.2 

24.3 

25.6 

17.9 

24 

22.6 


unknown. Hence the reconstructed images can only be compared by visual inspection. 
For all four experiments 40% of measurements are selected uniformly at random. 

The reconstructed images by the solvers TFOCS.con and pdNCG are presented 
in Figure [73] Solver pdNCG was faster on four out of five problems. On problems 
that pdNCG was faster it required on average 1.5 times less CPU time. Although it 
would be possible to tune pdNCG such that it is faster on all problems, we preferred 
to use its (simple) default tuning in order to avoid a biased comparison. 
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(a) TFOCS_con, 25 sec. (b) pdNCG, 7 sec. (c) TFOCS.con, 24 sec. 




(d) pdNCG, 15 sec. (e) TFOCS_con, 37 sec. (f) pdNCG, 15 sec. 


(g) TFOCS_con, 26 sec. (h) pdNCG, 27 sec. (i) TFOCS, 49 sec. 







(j) pdNCG, 33 sec. 


Fig. 7.3: Experiment on realistic image reconstruction where the samples are acquired 
using a single-pixel camera. The subcaptions of the figures show the required seconds 
of CPU time for the image to be reconstructed for each solver. 


7.11. Radar tone reconstruction. In this subsection we present the perfor¬ 
mance of TFOCS_unc and pdNCG for the radar tone reconstruction problem, which 
was described in Subsection |7.5| We exclude TVAL3 from this experiment because it 
is implemented to solve only TV problems. We also exclude TFOCS.unc since it is 
superseded in all previous experiments by TFOCS_con. 


The results of the comparison are presented in Figure 


7.4 


Observe in Figures 
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7.4a| and |7.4b| that both solvers recovered a solution of similar accuracy but pdNCG 


was slightly faster. The solution of TFOCS.con has SNR 62.3 dB and the solution of 
pdNCG has SNR 64.7 dB. It is important to mention that the problems were not over¬ 
solved. TFOCS_con was tuned as suggested by its authors in a similar experiment 
which is shown in Subsection 6.5 of [5]. 

In Figure |7~Ic| we plot the SNR against CPU time for every iteration of pdNCG 
and TFOCS.con. Observe that nearly for all iterations pdNCG the approximate 
solutions of pdNCG had larger SNR than the approximate solutions of TFOCS_con. 

For this experiment we enabled preconditioning for pdNCG. Although, in Subsec¬ 
tion [5T] we mentioned that preconditioning might affect adversely the performance of 
pdNCG in terms of CPU time. However, we noticed that by enabling preconditioning 
pdNCG was more stable. Therefore, we believe that it is worth paying the cost of a 
slight increase of a CPU time to improve the overall robustness of pdNCG. 


8. Conclusions. Recently there has been great interest in the development of 
optimization methods for the solution of compressed sensing problems. The methods 
that have been developed so far are mainly first-order methods. This is because first- 
order methods have inexpensive iterations and frequently offer fast initial progress 
in the optimization process. On the contrary, second-order methods are considered 
to be rather expensive. The reason is that often access to second-order information 
requires the solution of linear systems. In this paper we develop a second-order 
method, a primal-dual Newton Preconditioned Conjugate Gradients. We show that an 
approximate solution of linear systems which arise is sufficient to speed up an iterative 
method and additionally make it more robust. Moreover, we show that for compressed 
sensing problems an inexpensive preconditioner can be designed that speeds up even 
further the approximate solution of linear systems. Extensive numerical experiments 
are presented which support our findings. Spectral analysis of the preconditioner is 
performed and shows its very good limiting behaviour. 


Acknowledgement. We are grateful to the anonymous reviewers who made 
crucial recommendations which improved the clarity of contribution and the quality 
of this paper. 


REFERENCES 

[1] P. R. Amestoy. Algorithm 837: Amd, an approximate minimum degree ordering algorithm. 

ACM Transactions on Mathematical Software (TOMS), 30(3):381-388, 2004. 

[2] A. Auslender and Teboulle. Interior gradient and proximal methods for convex and conic 

optimization. SIAM Journal on Optimization, 16(3):697-725, 2006. 

[3] A. Beck and M. Teboulle. Smoothing and first order methods: A unified framework. SIAM J. 

Optim., 22(2):557-580, 2012. 

[4] S. R. Becker, J. Bobin, and E. J. Candes. NestA: A fast and accurate first-order method for 

sparse recovery. SIAM J. Imaging Sciences, 4(1): 1-39, 2011. 

[5] S. R. Becker, E. J. Candes, and M. C. Grant. Templates for convex cone problems with 

applications to sparse signal recovery. Mathematical Programming Computation, 3(3):165- 
218, 2011. http://cvxr.com/tfocs/ 

[6] J. M. Bioucas-Dias and M. A. T. Figueiredo. A new TwIST: Two-step iterative shrink¬ 

age/thresholding algorithms for image restoration. IEEE Transactions on Image Pro¬ 
cessing, 16(12):2992-3004, 2007. 

[7] E. J. Candes. The restricted isometry property and its implications for compressed sensing. C. 

R. Acad. Sci. Paris, 346:589-592, 2008. 

[8] E. J. Candes and D. L. Donoho. New tight frames of curvelets and optimal representations of 

objects with piecewise c 2 singularities. Comm. Pure Appl. Math., 57:219-266, 2004. 

[9] E. J. Candes, Y. C. Eldar, and D. Needed. Compressed sensing with coherent and redundant 

dictionaries. Applied and Computational Harmonic Analysis, 31(l):59-73, 2011. 


29 











Frequency (GHz) 


Frequency (GHz) 


(a) pdNCG, SNR 64.7, 525 sec 


(b) TFOCS_con, SNR 62.3, 556 sec 



(c) SNR against CPU time 


Fig. 7.4: Reconstruction of two radio-frequency radar tones by TFOCS_con and 
pdNCG. In Figure |7.4a| the reconstructed signal by pdNCG is shown. The signal 
has SNR 64.7 dB and it required 525 seconds of CPU time to be reconstructed. In 
Figure 7.4b the reconstructed signal by TFOCS_con is shown. The signal has PSNR 


62.3 dB and it required 556 seconds of CPU time to be reconstructed. In Figure 7.4c 


we present the SNR for each iteration of pdNCG and TFOCS_con against CPU time. 


[10] R. H. Chan, T. F. Chan, and H. M. Zhou. Advanced signal processing algorithms, in Proceedings 

of the International Society of Photo-Optical Instrumentation Engineers, F. T. Luk, ed., 
SPIE , pages 314-325, 1995. 

[11] T. F. Chan, G. H. Golub, and P. Mulet. A nonlinear primal-dual method for total variation- 

based image restoration. SIAM J. Sci. Comput ., 20(6): 1964-1977, 1999. 

[12] L. Condat. A generic proximal algorithm for convex optimization - Application to total- 

variation. IEEE Signal Processing Letters , 21(8):985-989, 2014. 

[13] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, S. Ting, K. F. Kelly, and R. G. Bara- 

niuk. Single-pixel imaging via compressive sampling. IEEE Signal Processing Magazine , 
25(2):83-91, 2008. http://dsp.rice.edu/cscamera 

[14] J. E. Esser. Primal Dual Algorithms for Convex Models and Applications to Image Restoration, 

Registration and Nonlocal Inpainting. PhD thesis, University of California, 2010. 

[15] K. Fountoulakis and J. Gondzio. A second-order method for strongly convex I \-regularization 

problems. Mathematical Programming (accepted), 2015. DOI: 10.1007/sl0107-015-0875-4. 


30 


























[16] K. Fountoulakis, J. Gondzio, and P. Zhlobich. Matrix-free interior point method for compressed 

sensing problems. Mathematical Programming Computation , 6(1): 1-31, 2014. 

[17] E. T. Hale, W. Yin, and Y. Zhang. Fixed-point continuation for t\ -minimization: Methodology 

and convergence. SIAM J. Optim ., 19(3): 1107-1130, 2008. 

[18] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge 

University Press, ISBN: 0521540518, second edition, 2004. 

[19] C. Li, W. Yin, H. Jiang, and Y. Zhang. An efficient augmented Lagrangian method with ap¬ 

plications to total variation minimization. Computational Optimization and Applications , 
56(3):507-530, 2013. 

[20] I. Loris and C. Verhoeven. On a generalization of the iterative soft-thresholding algorithm for 

the case of non-separable penalty. Inverse Problems , 27(12):1-15, 2011. 

[21] S. Mallat. A wavelet tour of signal processing, second ed. Academic Press, London , 1999. 

[22] J. J. Moreau. Proximite et dualite dans un espace Hilbertien. Bull. Soc. Math. France , 93:273- 

299, 1965. 

[23] D. Needell and R. Ward. Stable image reconstruction using total variation minimization. SIAM 

J. Imaging Sciences , 6(2): 1035-1058, 2013. 

[24] Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program ., 103(1):127-152, 

2004. 

[25] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 2006. 

[26] S. Vaiter, G. Peyre, C. Dossal, and J. Fadili. Robust sparse analysis regularization. IEEE 

Trans. Inf. Theory , 59(4):2001-2016, 2013. 

[27] C. R. Vogel and M. E. Oman. Fast, robust total variation-based reconstruction of noisy, blurred 

images. Image Processing, IEEE Transactions on, 7(6):813-824, 1998. 

Appendix A. Continuous path. In the following lemma we show that x c ^ := 
argmin f^(x) (ff is defined in ( |2.2| )) for c constant is a continuous and differentiable 
function of p. 

Lemma A.l. Letc be constant and consider x c ^ as afunctional of p. If condition 


Proof The optimality conditions of problem ( |2.2| ) are 
cV'ip fl (W*x) + A 1 (Ax -b) = 0. 


(1.4) is satisfied, then x C:/1 is continuous and differentiable. 


According to definition of we have 


(v^X^ + ^L) 


cVVv(^Xm) + A j (Ax c ^ -b) = 0 

^ dV'ip /1 (W x c + \ J A d Xc T _ q 
dp dp 

A T A —= 0 
dp 


(cV 2 Vv(WXd + VA) 


dp 

dx, 


dp 


c,p , c dVMVpc) 


dp 


^ iw . x ^ +c ^pi 


= 0 

= 0 , 


where dS/f)^(W*x)/dp\ Xc is the first-order derivative of 'Vf> /1 (W*x) as a functional 
of /i, measured at x cj/x . Notice that due to condition Ker(VL*)DKer(A) = {0} we have 
that V 2 /^(x) is positive definite Vx, hence x c ^ is unique. Therefore, the previous 
system has a unique solution, which means that x c ^ is uniquely differentiable as a 
functional of p with c being constant. Therefore, x c ^ is continuous as a functional of 

p. □ 

remark A.2. Lemma \A.l\ and continuity imply that there exists sufficiently small 
smoothing parameter p such that \\x C j/jL — x c \\2 < oj for any arbitrarily small uo > 0. 
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